AxoNN: energy-aware execution of neural network inference on multi-accelerator heterogeneous SoCs

https://doi.org/10.1145/3489517.3530572

Dagli, Ismet; Cieslewicz, Alexander; McClurg, Jedidiah; Belviranli, Mehmet E. (July 2022, DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference)

The energy and latency demands of critical workloads in embedded systems, such as object detection, vary with the physical state of the system and other external factors. Many recent mobile and autonomous Systems-on-Chip (SoCs) embed a diverse range of accelerators with distinct power and performance characteristics. The execution flow of a critical workload can be adjusted to span multiple accelerators, so that the trade-off between performance and energy fits the dynamically changing physical factors. In this study, we propose running neural network (NN) inference on multiple accelerators of an SoC. Our goal is to enable an energy-performance trade-off by distributing the layers of an NN between a performance-efficient and a power-efficient accelerator. We first provide an empirical modeling methodology to characterize execution times and inter-layer transition times. We then find an optimal layer-to-accelerator mapping by representing the trade-off as a linear programming optimization constraint. We evaluate our approach on the NVIDIA Xavier AGX SoC with commonly used NN models. We use the Z3 SMT solver to find schedules for different energy consumption targets, with up to 98% prediction accuracy.
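The mapping problem in the abstract can be illustrated with a toy model. The sketch below is one plausible reading of that problem, built on several assumptions. Each layer has a profiled (time, energy) pair on each accelerator. Switching accelerators between consecutive layers adds a fixed time and energy cost. The goal is the fastest mapping whose total energy stays within a target.

Everything specific here is hypothetical: the accelerator names "GPU" and "DLA", all cost numbers, and the functions `schedule_cost` and `best_mapping`. The search is plain exhaustive enumeration, which stands in for the authors' Z3-based formulation and is not their method.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-layer profile: accelerator name -> (time_ms, energy_mJ).
# Real values would come from empirical profiling on the target SoC.
Profile = Dict[str, Tuple[float, float]]
Schedule = Tuple[Tuple[str, ...], float, float]


def schedule_cost(
    mapping: Sequence[str],
    layers: List[Profile],
    switch_time: float,
    switch_energy: float,
) -> Tuple[float, float]:
    """Total (time, energy) when layer i runs on mapping[i], one after another."""
    total_time = total_energy = 0.0
    for i, accel in enumerate(mapping):
        t, e = layers[i][accel]
        total_time += t
        total_energy += e
        # Assumed model: charge a fixed transition cost whenever consecutive
        # layers run on different accelerators.
        if i > 0 and mapping[i - 1] != accel:
            total_time += switch_time
            total_energy += switch_energy
    return total_time, total_energy


def best_mapping(
    layers: List[Profile],
    energy_budget: float,
    switch_time: float = 0.0,
    switch_energy: float = 0.0,
) -> Optional[Schedule]:
    """Fastest layer-to-accelerator mapping whose energy fits the budget.

    Stand-in for a solver-based search: exhaustive enumeration of all k**n
    mappings (k accelerators, n layers), which is fine for a toy model but
    exponential in general. Returns None if no mapping meets the budget.
    """
    accels = sorted(layers[0])  # assumes every layer lists the same accelerators
    best: Optional[Schedule] = None
    for mapping in product(accels, repeat=len(layers)):
        t, e = schedule_cost(mapping, layers, switch_time, switch_energy)
        if e <= energy_budget and (best is None or t < best[1]):
            best = (mapping, t, e)
    return best


# Illustrative three-layer network: the GPU is faster, the DLA uses less energy.
EXAMPLE_LAYERS: List[Profile] = [
    {"GPU": (1.0, 10.0), "DLA": (3.0, 4.0)},
    {"GPU": (2.0, 20.0), "DLA": (5.0, 8.0)},
    {"GPU": (1.0, 10.0), "DLA": (2.0, 3.0)},
]

if __name__ == "__main__":
    for budget in (40.0, 25.0, 10.0):
        print(budget, best_mapping(EXAMPLE_LAYERS, budget, 0.5, 1.0))
```

The example uses a switch cost of 0.5 ms and 1.0 mJ. With a 40 mJ budget, the all-GPU mapping wins at 4.0 ms. Tightening the budget to 25 mJ yields ("GPU", "DLA", "DLA") at 8.5 ms and 22.0 mJ. At 10 mJ no mapping is feasible. Lowering the energy target therefore pushes layers onto the power-efficient accelerator at the cost of latency, and the transition cost discourages frequent switching.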
